[57] ABSTRACT

A 3-by-3 convolver utilizes 9 binary arithmetic units
(10) connected in cascade for multiplying 12-bit binary
pixel values p_i, which are positive or two's complement
binary numbers, by 5-bit magnitude (plus sign) weights
w_i, which may be positive or negative. The weights are
stored in registers (13, 14 and 15) including the sign bits
(shown separately for convenience). For a negative
weight, the one's complement of the pixel value to be
multiplied is formed at each unit by a bank of exclusive
OR gates under control of the sign of the corresponding
weight w_i, and a correction is made by adding the sum
of the absolute values of all the negative weights for
each 3x3 kernel. Since this correction value remains
constant as long as the weights are constant, it can be
precomputed and stored in a register (16) as a value to
be added to the product PW of the first arithmetic unit.

10 Claims, 2 Drawing Sheets





















4,750,144 







































U.S. Patent Jun. 7, 1988 


Sheet 2 of 2 4,750,144 


[FIG. 2: Example of the shift-and-add algorithm. Registers:
Multiplicand (input 1010), Multiplier (input 1101) and
Accumulator (input S = Σ|w_i| or PW+S). Steps: 1. Enter S;
2. Add p_i, shift p_i; 3. Add zero, shift p_i; 4. Add p_i,
shift p_i; 5. Add p_i. The accumulator then holds the final
product PW+S, which is the output.]





REAL TIME PIPELINED SYSTEM FOR FORMING 
THE SUM OF PRODUCTS IN THE PROCESSING 
OF VIDEO DATA 


ORIGIN OF INVENTION 

The invention described herein was made in the performance
of work under a NASA contract, and is subject to the
provisions of Public Law 96-517 (35 USC 202) in which the
Contractor has elected not to retain title.

BACKGROUND OF THE INVENTION 

This invention relates to a convolver, namely a system
comprising an arithmetic unit for carrying out the
operation defined by the following equation:

$$\sum_{i=1}^{k} p_i w_i = \sum_{w_i>0} p_i\,|w_i| + \sum_{w_i<0} \bar{p}_i\,|w_i| + \sum_{w_i<0} |w_i|$$

This equation is explained and illustrated for the case of
k=9, i.e., for a 3-by-3 moving window (kernel) of video
data. More specifically, the invention relates to a real
time pipelined convolver for forming the sum of products
p_i w_i, wherein i is a number from 1 to n^2 and n is a
number that defines the size of an n-by-n kernel, p_i are
the pixel values, typically 12-bit values of an n-by-n
kernel, and w_i are the convolver weights that may have
positive (w_i>0) and negative (w_i<0) values. The pixel
values and weights are represented by absolute value binary
numbers and a sign bit.

In the processing of video data, it is necessary to
produce the sum of products of fixed weights w_i times
the corresponding pixel values p_i of successive rows in
an n-by-n kernel, such as a 3-by-3 kernel. Examination
of the typical weights involved in low-level vision
applications indicates that small positive or negative
integers are most common, with the ratio of the largest to
the smallest weight being usually less than 20.
Consequently, each weight contains six bits consisting of a
sign bit and five bits for magnitude from zero to 31. The
sign bit expands the range of weight values to -31 through
+31.

To prevent the data path from overflowing, it is
necessary to scale the output of the convolver. Scaling is
accomplished by shifting down the data one or more
bits, i.e., dividing by some power of two. This is most
easily done by switches selecting the parallel output,
i.e., by selecting the output from one or more bit
positions to the right. The problem is in the requisite
hardware for multiplying the pixel values p_i by positive
and negative weights w_i and forming the sum of the
products.

SUMMARY OF THE INVENTION

In accordance with the present invention, a convolver
is comprised of N-1 buffers for storing N-1
rasters (scan lines) of pixel values for an n-by-n kernel.
The necessary multiplications and additions are carried
out in arithmetic operating units connected in cascade
in accordance with the following equation:

$$\sum_{i=1}^{k} p_i w_i = \sum_{w_i>0} p_i\,|w_i| + \sum_{w_i<0} \bar{p}_i\,|w_i| + \sum_{w_i<0} |w_i|$$

where k = n×n, p_i are the pixel values, w_i are the
convolver weights, and p̄_i are the one's complement of pixel
values which are either unsigned or two's complement
binary numbers, and weights are represented by absolute
value binary numbers plus sign bits.

The N-1 complete scan lines that must be stored in order
to cover successive n-by-n kernels are stored in N-1
series connected buffers, each buffer storing a number n
of pixels where n is the total number of pixels in a scan
line. The accumulation of successive products p_i w_i is
accomplished by an array of m one-bit full adders using
the absolute values of the convolver weights, both positive
and negative, and the one's complement of the pixel value
p_i when it is to be multiplied by a negative convolver
weight w_i. The accumulated sums of p_i|w_i| and p̄_i|w_i|
are corrected for the negative convolver weights by adding
to the sum of products p_i|w_i| + p̄_i|w_i| the sum of |w_i|
for all convolver weights less than zero, i.e., for
negative convolver weights.
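
The correction described above can be checked numerically. The following is only a minimal sketch, not the circuit itself: it uses Python's unbounded integers, in which `~p` equals -p-1 and so plays the role of the one's complement. The function names `sum_of_products` and `corrected_sum` are hypothetical and chosen here for illustration.

```python
def sum_of_products(pixels, weights):
    """Reference result: the ordinary signed sum of p_i * w_i."""
    return sum(p * w for p, w in zip(pixels, weights))


def corrected_sum(pixels, weights):
    """Form the same sum from weight magnitudes and complemented pixels."""
    total = 0
    for p, w in zip(pixels, weights):
        if w >= 0:
            total += p * abs(w)        # positive weight: p_i * |w_i|
        else:
            total += (~p) * abs(w)     # negative weight: complement of p_i times |w_i|
    # Correction term: sum of |w_i| over the negative weights only.
    total += sum(abs(w) for w in weights if w < 0)
    return total


if __name__ == "__main__":
    pixels = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    weights = [-1, -2, -1, 0, 0, 0, 1, 2, 1]   # an illustrative 3-by-3 kernel
    print(sum_of_products(pixels, weights), corrected_sum(pixels, weights))
```

Both functions agree for any integer inputs, because each negative-weight term contributes (-p_i - 1)|w_i| and the correction restores the missing |w_i|.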

The novel features that are considered characteristic 
of this invention are set forth with particularity in the 
appended claims. The invention will best be understood 
from the following description when read in connection 
with the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a schematic diagram of the present inven- 
tion. 

FIG. 2 illustrates an example of the algorithm for
forming the sum of products PW+S in the system of
FIG. 1.


DESCRIPTION OF PREFERRED 
EMBODIMENTS 

Referring to the drawings, FIG. 1 is a schematic
diagram of a 3-by-3 convolver which produces the sum
of products of nine fixed weights w_1 through w_9 times
the corresponding pixel values p_1 through p_9 in a 3-by-3
moving window of video data. Examination of the typical
weights used in low-level vision algorithms indicates
that small positive or negative integers are most
commonly used, with the ratio of the largest to the smallest
weight being usually less than 20. This means that a
six-bit (including sign) weight is adequate, since this can
represent integer values from -31 to +31.

To prevent the 12-bit data path from overflowing
(because multiplying a 12-bit pixel by a 5-bit weight
produces a result PW+S having 17 significant bits), it is
also necessary to scale the output of the convolver in
some variable way (since all large positive weights will
produce a much larger result than a mix of small positive
and negative weights). Scaling is most easily accomplished
in hardware by shifting the data one or more bits, i.e.,
dividing by some power of two. If scaling were not
included, adding nine of these 17-bit quantities PW+S
together could produce a result having as many as 21
significant bits. Since only 12 of these bits can be output
to maintain a constant data path throughout the pipeline,
it would be excessive to compute the result accurately to
21 bits. However, somewhat more than 12 bits must be
retained in intermediate stages of the convolver, since it
is common to take derivatives of heavily smoothed data,
which involves subtracting quantities which are nearly
equal. To preserve 12 significant bits of result when
subtracting quantities differing only by 10% or so
requires a 16-bit internal data path. To ensure validity of
the least significant bit of the output, an additional bit
of low significance is also needed internally to the
convolver. Thus, the convolver is provided with a 17-bit
parallel data path between pipelined blocks for the
results PW+S.

Scaling of the 17-bit result is accomplished in two
stages: the input pixel values to the convolver may be
shifted down in significance, allowing more room for
carry overflow when large positive weights are used,
and the 12-bit output actually taken from the 17-bit
convolver data output may be shifted up to allow for
cancellation when subtracting nearly equal quantities.
As discussed above, adding the products of nine 12-by-5
multiplications can produce a 21-bit result. However,
having all nine weights near the maximum value of 31 is
very unlikely, since they could all be divided by two
and the result scaled to produce nearly identical results.
Thus, it is reasonable to assume that the sum of the nine
weights can always be kept to somewhat under the
maximum 279 (9-by-31). If the sum of the weights is
kept under 256 (9% less than 279), overflow into the
21st bit can be avoided. This means that shifting the
input pixel values down in significance by up to three
bits permits the 17-bit data path to accommodate the
most significant bits of the 20-bit result. As a
consequence, using the convolver of FIG. 1 calls for a
programmable shift of from zero to three bits in the input
data using external data input select switches as a scaler
A. The shift of the output 12-bit data path with respect
to the internal 17-bit data path is similarly programmable
from zero to three using a scaler B, so that when
subtracting nearly equal quantities more significant bits
are preserved.
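
The bit-width figures quoted above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: it treats pixels and weights as unsigned magnitudes and ignores the sign bit and the correction term S. The function name `width_budget` and its parameter names are hypothetical.

```python
def width_budget(pixel_bits=12, weight_bits=5, taps=9,
                 weight_sum_limit=256, input_shift=3):
    """Return the number of significant bits needed in four worst cases."""
    max_pixel = (1 << pixel_bits) - 1      # largest 12-bit magnitude, 4095
    max_weight = (1 << weight_bits) - 1    # largest 5-bit magnitude, 31
    max_weight_sum = weight_sum_limit - 1  # weight magnitudes summing to under 256
    return {
        # One 12-by-5 product.
        "one_product": (max_pixel * max_weight).bit_length(),
        # Nine maximal products added together, no scaling.
        "nine_products": (taps * max_pixel * max_weight).bit_length(),
        # Sum of products when the weights sum to under 256.
        "bounded_sum": (max_pixel * max_weight_sum).bit_length(),
        # Same bound after shifting the input pixels down by three bits.
        "bounded_sum_shifted": ((max_pixel >> input_shift) * max_weight_sum).bit_length(),
    }


if __name__ == "__main__":
    for name, bits in width_budget().items():
        print(name, bits)
```

Under these assumptions the four cases need 17, 21, 20 and 17 bits respectively, which matches the 17-bit data path with an input shift of up to three bits.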

The arithmetic unit blocks 10 are preferably implemented
with custom VLSI circuit chips. The two complete scan
lines that must be stored in order to cover the 3-by-3
window are stored in conventional line buffers (memory
chips) comprised of N-3 pixel delays 11 and 12, each with
three pixel delay elements in cascade for a total of N
pixel delays, and three pixel delay elements which precede
the N-3 pixel delay 11 to store the pixels p_1 through p_9
in a moving window array as follows:

p_3   p_2   p_1

p_6   p_5   p_4

p_9   p_8   p_7
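
The delay-line arrangement can be modelled in software as a single shift register of 2N+3 pixels with nine taps. The sketch below is an illustration under assumptions, not the circuit of FIG. 1: the tap numbering (p_1 taken as the newest pixel, p_9 as the oldest) is assumed here, and `moving_windows` and `line_length` are hypothetical names.

```python
from collections import deque


def moving_windows(stream, line_length):
    """Yield (p_1, ..., p_9) once per pixel time after the delay line has filled."""
    n = line_length
    # Three taps per scan line; successive lines are n pixel delays apart.
    taps = [0, 1, 2, n, n + 1, n + 2, 2 * n, 2 * n + 1, 2 * n + 2]
    # Total delay: 3 + (n - 3) + 3 + (n - 3) + 3 = 2n + 3 pixels.
    delay_line = deque(maxlen=2 * n + 3)
    for pixel in stream:
        delay_line.appendleft(pixel)   # index 0 always holds the newest pixel
        if len(delay_line) == delay_line.maxlen:
            yield tuple(delay_line[t] for t in taps)


if __name__ == "__main__":
    # A 4-pixel-wide raster whose pixel values are their own stream positions.
    for window in moving_windows(range(12), line_length=4):
        print(window)
```

With a 4-pixel line, the first complete window contains stream positions 10, 9, 8 from the current line, 6, 5, 4 from the previous line, and 2, 1, 0 from the line before that.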

It is highly desirable to utilize VLSI technology for
implementation of the convolver (except for the line
buffers), including registers 13, 14 and 15 for the three
sets of weights w_1-w_3, w_4-w_6 and w_7-w_9. Note that the
weights are represented by five bits of absolute value
(magnitude) plus a sign bit. The sign bits are stored in
registers 13a, 14a and 15a which are in reality part of
registers 13, 14 and 15, but shown here separate because
they are so separated by the convolver algorithm. Custom
VLSI implementation allows the 12-by-5 bit arithmetic
units 10 to be implemented directly within an
internal 17-bit data path. Moreover, a custom VLSI
circuit is easiest to design when the circuit to be
implemented is a regular, repeated structure, as in this
case of pipelined arithmetic units 10.

Accumulation of successive multiplications can be
accomplished most straightforwardly by a multiplication
algorithm of repeated add and shift operations, as
illustrated in FIG. 2 using a simple example of a four-bit
pixel as the multiplicand 1010 and a four-bit weight as
the multiplier 1101. A nine-bit quantity S equal to
000100110, the sum of absolute values for all negative
weights (w_i<0) or the result PW+S from the previous
arithmetic unit, is assumed as an input to an arithmetic
unit 10. The incoming S is entered in the accumulator of
the arithmetic unit as a first step. Then the least
significant bit of the multiplier is inspected. Because it
is a bit 1, the next step is to add the multiplicand in the
four least significant bit positions of the accumulator.
In the third step, the next least significant bit of the
multiplier is inspected. Because it is a bit 0, zero is
effectively added by doing nothing except effectively
shifting the multiplicand before going into step number 4.
Then the third multiplier bit is inspected. Because it is
a bit 1, the multiplicand is effectively shifted, and
added to the content of the accumulator. The process is
repeated in step 5 for the last bit of the multiplier
which, upon inspection after a shift, causes the
multiplicand to be added to the partial products in the
accumulator. The final product PW+S appears in the
accumulator after the last step.
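
The five steps of this example can be traced in software. The following is a minimal sketch of the add-and-shift procedure with a preloaded accumulator, using the operand values given in the text; the function name `shift_and_add` and the returned trace are illustrative additions, not part of the described hardware.

```python
def shift_and_add(multiplicand, multiplier, s, multiplier_bits):
    """Return the final accumulator and its value after each step."""
    accumulator = s                        # step 1: enter S
    trace = [accumulator]
    for bit in range(multiplier_bits):     # inspect multiplier bits, LSB first
        if (multiplier >> bit) & 1:
            # Add the multiplicand at a position `bit` places higher.
            accumulator += multiplicand << bit
        # A 0 bit adds nothing; only the effective shift takes place.
        trace.append(accumulator)
    return accumulator, trace


if __name__ == "__main__":
    final, trace = shift_and_add(0b1010, 0b1101, 0b000100110, multiplier_bits=4)
    for step, value in enumerate(trace, start=1):
        print(step, format(value, "09b"))
```

For the operands above, the accumulator holds 000100110, 000110000, 000110000, 001011000 and finally 010101000, i.e. 10 × 13 + 38 = 168.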

This algorithm for the arithmetic units 10 can be
implemented for a 17-bit pixel and 5-bit weight (absolute
value) with a 17-by-5 array of one-bit full adder
circuits. (Note that in the preferred implementation
illustrated in the simple example with reference to FIG. 2,
the multiplicand is shifted only in the sense of it being
added in a position of one binary bit higher significance
in each step using selecting gates to the inputs of the
adder, rather than actually shifting the multiplier in the
registers 13, 14 and 15.) Thus, nine 17-by-5 arrays, for a
total of 765 full adders, are needed for the convolver. In
the preferred embodiment, seventeen-bit latches are
used at the output of each of the arithmetic units 10 to
store the intermediate accumulated results PW+S. This
allows each multiplication to take the full pixel scan
time. Appropriate reduction in the line buffer delay N
corrects for these delays. The multiplication steps
needed for each pixel are synchronously performed
during pixel scan time.

Signed arithmetic is accomplished by the simple
expedient of complementing the pixel value via exclusive
OR operators G_1 through G_9 (each consisting of a bank
of 17 exclusive OR gates connected to a one pixel delay
element D for a 3×3 array of data) prior to multiplication
by a negative weight. Since the one's complement
of the pixel value plus one would produce the negative
of that value in two's complement representation, the
product of a negative weight and the pixel value is equal
to the product of the absolute value of the weight (five
bits without a sign bit), and the complement of the pixel
value plus one. This is accomplished in the system of
FIG. 1 by adding the absolute values of all negative
weights together at the outset (since they are fixed at
the time of programming), and storing that sum
in a register 16.

The convolver thus organized performs the function
of the following equation:

$$\sum_{i=1}^{9} p_i w_i = \sum_{w_i>0} p_i w_i + \sum_{w_i<0} (-p_i)\,|w_i| \tag{1}$$

Using two's complement arithmetic (i.e., -p_i = p̄_i + 1),
this function becomes:

$$\sum_{i=1}^{9} p_i w_i = \sum_{w_i>0} p_i w_i + \sum_{w_i<0} (\bar{p}_i + 1)\,|w_i| \tag{2}$$

However, instead of forming p̄_i + 1, the distributive law
is applied yielding the following equation:

$$\sum_{i=1}^{9} p_i w_i = \sum_{w_i>0} p_i w_i + \sum_{w_i<0} \bar{p}_i\,|w_i| + \sum_{w_i<0} |w_i|$$

where p_i are the pixel values, w_i are the convolver
weights, and p̄_i are the one's complement of the pixel
values. In this way a uniform VLSI architecture can
handle both positive and negative convolver weights.
Note that the input pixel data to the convolver may be
positive or negative, and that the final output of the
convolver may also be positive or negative. In a
particular unit, all bits of a pixel value are complemented
when it is to be multiplied by a negative weight
expressed in the form of absolute binary value plus a sign
bit. Consequently, the multiplicand will have all 1's left
of the most significant bit to fill the 17-bit word for the
scaled pixel weight (if it was initially positive, or was
negative with a positive weight), and so the product
PW will be negative for a negative weight, but the
output PW+S may be positive or negative depending
on the value of S and whether it is positive or negative.
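
The behaviour of the 17-bit words can be illustrated with a small bit-level model. This is a sketch under assumptions rather than a description of the actual chip: each arithmetic unit is modelled as one multiply-accumulate truncated to 17 bits, the exclusive OR bank is modelled as an XOR with an all-ones mask, and the names `convolve_cascade`, `to_signed`, `WORD_BITS` and `MASK` are hypothetical.

```python
WORD_BITS = 17
MASK = (1 << WORD_BITS) - 1


def to_signed(word):
    """Interpret a 17-bit word as a two's complement integer."""
    if word & (1 << (WORD_BITS - 1)):
        return word - (1 << WORD_BITS)
    return word


def convolve_cascade(pixels, weights):
    """Model cascaded units fed by a precomputed correction value S."""
    # Correction register: sum of |w_i| over the negative weights.
    acc = sum(-w for w in weights if w < 0) & MASK
    for p, w in zip(pixels, weights):
        word = p & MASK                     # sign-extended 17-bit pixel word
        if w < 0:
            word ^= MASK                    # XOR bank: complement every bit
        acc = (acc + word * abs(w)) & MASK  # PW + S, kept to 17 bits
    return to_signed(acc)


if __name__ == "__main__":
    pixels = [100, -50, 25, 0, 300, -1, 2047, -2048, 7]
    weights = [-1, 2, -3, 4, 0, -5, 1, -1, 31]
    expected = sum(p * w for p, w in zip(pixels, weights))
    print(convolve_cascade(pixels, weights), expected)
```

The model agrees with the ordinary signed sum of products whenever that sum fits in a 17-bit two's complement word; intermediate wrap-around is harmless because all additions are taken modulo 2^17.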

In that regard, it should be noted that the correction
value

$$S = \sum_{w_i<0} |w_i|$$

added initially is always a positive number because it is
the sum of the absolute value of all negative weights
(five bits absolute without the sign bit).

Although particular embodiments of the invention
have been described and illustrated herein, it is
recognized that modifications and variations may readily
occur to those skilled in the art. Consequently, it is
intended that the claims be interpreted to cover such
modifications and variations.

What is claimed is:

1. A system utilizing n×n binary arithmetic units
connected in cascade for multiplying binary pixel
values p_i by binary weights w_i, where the pixel values
are either positive or two's complement and the weights
are expressed by an absolute value binary number and
sign bit, comprising

means for storing an array of n-by-n pixel values of
raster scanned video data in a moving window of n
successive pixels from n successive rasters,

means for coupling said array of n-by-n pixel values
to said n×n binary arithmetic units for multiplication
of positive binary pixel values by corresponding
positive weights and multiplication of the one's
complement of binary pixel values by corresponding
negative weights, and

means for adding to the total sum produced by said
arithmetic units connected in cascade the sum of
the absolute value of all negative weights.

2. A system comprised

of a number k of pipelined arithmetic operating units
and N-1 buffers connected in series for storing
values of two rasters of pixels, plus means for storing
the values of n pixels in series ahead of said
buffers, while convolving n-by-n kernels in succession,
where k equals n times n, and where the pixel
values are either positive or two's complement
binary numbers and the weights are positive or
negative numbers expressed as a binary number of
magnitude and a sign bit,

registers for storing k convolver weights, w_1 through w_k,

means responsive to a sign bit of negative weights for
forming the one's complement of the pixel values
to be multiplied by negative weights, and

means for forming the sum of products in each
operating unit in accordance with the equation PW+S,
where the product PW is formed by repeated shift
and add of pixel value P under control of a weight
value W, where S is a previous sum of products
PW formed in accordance with the equation

$$\sum_{i=1}^{k} p_i w_i = \sum_{w_i>0} p_i\,|w_i| + \sum_{w_i<0} \bar{p}_i\,|w_i| + \sum_{w_i<0} |w_i|$$

where p_i are the pixel values and p̄_i are the one's
complement of pixel values, and where S for the
first arithmetic unit forming the sum of products
PW+S is the term

$$\sum_{w_i<0} |w_i|.$$

3. A system as defined in claim 2 comprising

registers for storing the weights, including sign bits,
and

an additional register for storing the sum of the
absolute value of all negative weights formed at the
outset as said term

$$\sum_{w_i<0} |w_i|$$

to be added to the convolver output as a correction
for an error produced in multiplying pixel values
by negative weights by addition of the one's
complement of the negative weights.

4. A system as defined in claim 3 where n equals 3 for 
convolving 3 by 3 kernels. 

5. A system for forming the sum of products in the 
processing of video data comprising 

nine binary arithmetic units connected in cascade for
multiplying binary pixel values p_i which are positive
or two's complement binary numbers by
weights w_i which are positive or two's complement
binary numbers,

means for storing said weights, including the sign bits
of said weights,

means for forming the one's complement of a pixel
value p_i to be multiplied by a negative weight w_i
under control of the sign bit of the negative weight
w_i, and

means for adding as a correction the sum of the
absolute values of all the negative weights for each 3x3
kernel.

6. A system as defined in claim 5 wherein said correc- 
tion sum is precomputed, and 

including means for storing said precomputed correc- 
tion sum as a value to be added to the product of 
the first arithmetic unit. 

7. A system comprised of

N-1 buffers for storing N-1 rasters of pixel values
for an n-by-n kernel,

means for carrying out necessary multiplications and
additions in arithmetic operating units connected in
cascade in accordance with the following equation:

$$\sum_{i=1}^{k} p_i w_i = \sum_{w_i>0} p_i\,|w_i| + \sum_{w_i<0} \bar{p}_i\,|w_i| + \sum_{w_i<0} |w_i|$$

where k = n×n, p_i are the pixel values, w_i are the
system weights, and p̄_i are the one's complement of
pixel values which are either unsigned or two's
complement binary numbers, and weights are
represented by absolute value binary numbers plus
sign bits,

means for storing the number N-1 of complete rasters
that must be processed in order to cover successive
n-by-n kernels consisting of N-1 series
connected buffers, each buffer storing a number n
of pixels where n is the total number of pixels in a
raster,

means for accumulating successive products p_i w_i
consisting of an array of m one-bit full adders using
the absolute values of the system weights, both
positive and negative, and the one's complement of
the pixel value p_i when it is to be multiplied by a
negative weight w_i, and

means for correcting the accumulated sums of p_i|w_i|
and p̄_i|w_i| for negative weights by adding to the
sum of products p_i|w_i| + p̄_i|w_i| the sum of the
absolute value |w_i| for all negative weights.

8. A method for forming the sum of products in the
processing of video data in a kernel of n×n pixels by
multiplying binary pixel values p_i by binary weights w_i,
where the pixel values are either positive or two's
complement and the weights are expressed by an absolute
value binary number and sign bit, comprising the steps
of


storing the array of n-by-n pixel values of raster 
scanned video data in a moving window of n suc- 
cessive pixels from n successive rasters, 
multiplying positive binary pixel values by corre- 
sponding positive weights and multiplying the 
one’s complement of binary pixel values by corre- 
sponding negative weights, 
forming the sum of products produced by said multi- 
plying steps and 

adding to the sum of products the sum of the absolute 
value of all negative weights. 

9. A method for forming the sum of products in the
processing of video data in a kernel of 3×3 pixels by
multiplying binary pixel values p_i which are either
positive or two's complement binary numbers by weights
w_i which are positive or two's complement, comprising
the steps of

storing said weights including the sign bits of said
weights,

forming the one's complement of a pixel value p_i to be
multiplied by a negative weight w_i, and

adding as a correction the sum of the absolute values
of all the negative weights for each 3x3 kernel.

10. A method for forming the sum of products in the 
processing of video data in a 3-by-3 kernel as defined in 
claim 9, wherein said correction sum is precomputed, 
and 

including the step of storing said precomputed cor- 
rection sum as a value to be added to said sum of 
products. 

* * * * * 